Goto

Collaborating Authors

 ieee conf








Self-supervised Object-Centric Learning for Videos Görkay Aydemir

Neural Information Processing Systems

From these temporally-aware slots, the training objective is to reconstruct the middle frame in a high-level semantic feature space. We propose a masking strategy by dropping a significant portion of tokens in the feature space for efficiency and regularization. Additionally, we address over-clustering by merging slots based on similarity.